Cluster Description Formats, Problems and Algorithms
Authors
Abstract
Clustering is one of the major data mining tasks. So far, the database and data mining literature lacks a systematic study of cluster descriptions, which are essential for giving the user understandable knowledge of the clusters and for supporting further interactive exploration. In this paper, we introduce novel description formats that offer greater descriptive power. We define two alternative problems of generating cluster descriptions, Minimum Description Length and Maximum Description Accuracy, which provide different trade-offs between interpretability and accuracy. We also present heuristic algorithms for both problems, together with their empirical evaluation and a comparison to state-of-the-art algorithms.
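The abstract does not spell out the description formats themselves. The following Python sketch is only a rough illustration of the trade-off between interpretability and accuracy, and it rests on several assumptions:

- A cluster description is a disjunction of axis-parallel hyper-rectangles, the representation studied in the hyper-rectangle-based generalization work.
- The description length is the number of rectangles.
- Accuracy is the fraction of points the description classifies correctly.

These measures and all names (`Rect`, `covered`, `description_length`, `description_accuracy`) are hypothetical. They are not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Rect:
    """Axis-parallel hyper-rectangle: lo[i] <= x[i] <= hi[i] in every dimension i."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


def covered(description: Sequence[Rect], p: Point) -> bool:
    # Assumed semantics: the description is a disjunction (union) of rectangles.
    return any(r.contains(p) for r in description)


def description_length(description: Sequence[Rect]) -> int:
    # Hypothetical length measure: number of rectangles in the description.
    return len(description)


def description_accuracy(description: Sequence[Rect],
                         cluster: Sequence[Point],
                         others: Sequence[Point]) -> float:
    # Hypothetical accuracy measure: fraction of points classified correctly,
    # i.e. cluster points that are covered plus non-cluster points that are not.
    correct = sum(covered(description, p) for p in cluster)
    correct += sum(not covered(description, p) for p in others)
    return correct / (len(cluster) + len(others))


# Toy data: an L-shaped cluster and four points outside it.
cluster = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1)]
others = [(2, 2), (2, 3), (3, 2), (3, 3)]

coarse = [Rect((1, 1), (3, 3))]                      # short but imprecise
fine = [Rect((1, 1), (1, 3)), Rect((1, 1), (3, 1))]  # longer but exact

for name, d in (("coarse", coarse), ("fine", fine)):
    print(name, description_length(d), round(description_accuracy(d, cluster, others), 3))
# coarse 1 0.556
# fine 2 1.0
```

Under these assumptions, a single bounding rectangle gives the shortest description but wrongly covers the four non-cluster points. Two rectangles describe the L-shaped cluster exactly. One plausible reading of the two problems is that one minimizes length under an accuracy requirement and the other maximizes accuracy under a length budget. The paper's exact formulations may differ.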
Similar resources
Hyper-rectangle-based Discriminative Data Generalization and Applications in Data Mining
The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-ba...
New Heuristic Algorithms for Solving Single-Vehicle and Multi-Vehicle Generalized Traveling Salesman Problems (GTSP)
Among numerous NP-hard problems, the Traveling Salesman Problem (TSP) has been one of the most explored, yet still not fully understood, problems. Even a minor modification changes the problem’s status, calling for a different solution. The Generalized Traveling Salesman Problem (GTSP) expands the TSP to a much more complicated form, replacing single nodes with a group or cluster of nodes, where the objective is to fi...
DL-Learner: Learning Concepts in Description Logics
In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for diffe...
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory from social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research has shown that this theory can significantly increase the stability and performance of...
Automatic Clustering of Data Using an Improved Imperialist Competitive Algorithm
The Imperialist Competitive Algorithm (ICA) is considered a prime metaheuristic for finding the globally optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using a suitable structure for each chromosome and the ICA, the proposed method (ACICA) finds the optimum number of clusters at run time while optim...